This analysis uses the Bioconductor package fastseg to segment chromosomes based on a numeric variable, such as DNA copy number or the fold change of RNA expression. Please refer to the package manual for a full description. In summary, the fastseg package implements a fast and efficient segmentation algorithm based on the cyber t-test (Baldi and Long, 2001). The segments identified by the algorithm are then summarized and compared with segments derived from randomized data in terms of their frequency, length (genomic span), size (number of loci), and mean of the numeric variable (copy number, fold change, etc.).
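The cyber t-test replaces each group's sample variance with a Bayesian regularized variance that shrinks it toward a background value. The sketch below illustrates that statistic on two adjacent windows of loci. It is not fastseg's internal code. The helper names `regularized_var()` and `cyber_t()` are hypothetical, and it assumes that fastseg's `cyberWeight` plays the role of the prior weight `nu0` and that the background variance `sigma0_2` is estimated elsewhere (for example, pooled over many loci).

```r
# Regularized ("cyber") variance following Baldi and Long (2001):
#   (nu0 * sigma0^2 + (n - 1) * s^2) / (nu0 + n - 2)
# Assumption: nu0 is analogous to fastseg's cyberWeight; sigma0_2 is a
# background variance supplied by the caller.
regularized_var <- function(x, sigma0_2, nu0 = 10) {
  n <- length(x)
  s2 <- if (n > 1) var(x) else 0          # sample variance of the window
  (nu0 * sigma0_2 + (n - 1) * s2) / (nu0 + n - 2)
}

# Two-sample t statistic comparing two adjacent windows of loci, using the
# regularized variance in place of the plain sample variance.
cyber_t <- function(left, right, sigma0_2, nu0 = 10) {
  v_left  <- regularized_var(left, sigma0_2, nu0)
  v_right <- regularized_var(right, sigma0_2, nu0)
  (mean(left) - mean(right)) / sqrt(v_left / length(left) + v_right / length(right))
}

# Toy values on either side of a hypothetical breakpoint (not real data)
left  <- c(0.1, -0.2, 0.05, 0.0)
right <- c(1.9, 2.1, 2.3, 1.8)
cyber_t(left, right, sigma0_2 = 0.05)
```

Shrinking toward `sigma0_2` keeps very short windows, whose sample variance is unstable, from producing extreme t values; a larger `nu0` gives the background variance more weight.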


1 Description

1.1 Project

Genetic Modifiers in Trisomy 21 Leukemogenesis.

1.2 Data

Public GEO data set: RPKM and log2FC values were downloaded from GSE55504.

1.3 Analysis

Log2 fold change between one pair of monozygotic twins (T2N_Rep0 vs. T1DS_Rep0). The goal is to regenerate Figure 1a of the original paper.
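The log2FC values used here were downloaded from GEO rather than recomputed. For readers who want to derive a comparable value from the RPKM columns, a minimal sketch follows. The helper `log2_fold_change()` is hypothetical, the pseudocount of 1 (to avoid `log2(0)`) is an assumption, and so is the direction, taken here as T2N over T1DS to follow the order "T2N_Rep0 vs. T1DS_Rep0".

```r
# Hypothetical helper: log2 fold change between two samples' RPKM values.
# Assumption: a pseudocount of 1 is added to both samples to avoid log2(0).
log2_fold_change <- function(rpkm_a, rpkm_b, pseudocount = 1) {
  stopifnot(length(rpkm_a) == length(rpkm_b))
  log2((rpkm_a + pseudocount) / (rpkm_b + pseudocount))
}

# Toy RPKM values for three genes in each twin (not real data)
t2n  <- c(1, 3, 0)
t1ds <- c(3, 7, 0)
log2_fold_change(t2n, t1ds)   # -1 -1 0
```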

2 Results

2.1 Summary

Parameters:

  • Variable name: log_rpkm_T2N_Rep0

  • Data columns
    • variable: 34
    • chromosome: 1
    • start: 2
    • end: 3
    • strand: 0
  • Randomization
    • round: 10
    • chromosome: 1
  • Runtime options
    • minSeg: 3
    • type: 1
    • alpha: 0.1
    • delta: 5
    • squashing: 0
    • cyberWeight: 10
  • Segment selection
    • size: 1
    • min: 0
    • max: Inf
    • negpos: 0
    • top: 3
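The "Data columns" entries are 1-based column indices into the input table, and the "Runtime options" share their names with arguments of `fastseg::fastseg()`. A minimal sketch of how the two could be wired together, assuming the input is a data frame and that each chromosome is segmented separately as a position-sorted numeric vector. The helper `split_by_chromosome()` is hypothetical, and the `fastseg()` call is left as a comment so the block runs without Bioconductor.

```r
# Column indices from the "Data columns" parameters (1-based)
data_columns <- list(variable = 34, chromosome = 1, start = 2, end = 3)

# Runtime options, named as the corresponding fastseg::fastseg() arguments
runtime_options <- list(minSeg = 3, type = 1, alpha = 0.1, delta = 5,
                        squashing = 0, cyberWeight = 10)

# Hypothetical helper: pull the relevant columns, sort loci by start position
# within each chromosome, and return one numeric vector per chromosome.
split_by_chromosome <- function(df, cols) {
  loci <- data.frame(chromosome = df[[cols$chromosome]],
                     start      = df[[cols$start]],
                     end        = df[[cols$end]],
                     value      = df[[cols$variable]])
  loci <- loci[order(loci$chromosome, loci$start), ]
  split(loci$value, loci$chromosome)
}

# Each vector could then be segmented, for example with:
#   do.call(fastseg::fastseg, c(list(x = values), runtime_options))
```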

Table 1. Brief summary of inputs and outputs.

Description Value
Total number of loci 13130
Total number of chromosomes 23
Range of values -2.69857 to 3.942649 (mean=-0.04061339)
Number of segments 83
Length of segments (bp) 420947 to 182446493 (mean=35301623)
Size of segments (number of loci) 6 to 939 (mean=158.1928)
Mean of segments -1.457923 to 3.446533 (mean=0.1998589)
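Segment length and segment size measure different things: length is the genomic span, which in Table 2 equals end - start + 1 (for segment_1, 1297157 - 11869 + 1 = 1285289), while size is the number of loci. The mean size times the number of segments (158.1928 x 83 ≈ 13130) matches the total number of loci, consistent with every locus falling in exactly one segment. A minimal sketch of how such a summary could be computed from a segment table; `summarize_segments()` and its column names are hypothetical.

```r
# Hypothetical segment table: one row per segment with genomic start/end,
# number of loci (size) and segment mean.
summarize_segments <- function(segs) {
  seg_length <- segs$end - segs$start + 1   # genomic span in bp, as in Table 2
  list(
    n_segments = nrow(segs),
    total_loci = sum(segs$size),
    length     = c(min = min(seg_length), max = max(seg_length), mean = mean(seg_length)),
    size       = c(min = min(segs$size), max = max(segs$size), mean = mean(segs$size)),
    mean       = c(min = min(segs$mean), max = max(segs$mean), mean = mean(segs$mean))
  )
}

# First two segments of Table 2
segs <- data.frame(start = c(11869, 1309110), end = c(1297157, 151138424),
                   size = c(42, 939), mean = c(1.8237, 0.1195))
summarize_segments(segs)$length
```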

2.2 Segmentation

Figure 1. Global view of segmentation across all chromosomes (shown in alternating colors). Red lines indicate segment locations.

Figure 2. Distribution of log_rpkm_T2N_Rep0: original values at all individual loci vs. segment means.
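Figure 2 compares two sets of numbers: the value at every individual locus and the mean of each segment. A minimal sketch of recomputing segment means from loci-level values, assuming the loci are sorted by position and segments are consecutive, non-overlapping runs whose sizes are known. The helper `segment_means()` is hypothetical and the values are toy data.

```r
# Hypothetical helper: recompute segment means from loci-level values, given
# the number of consecutive loci in each segment.
segment_means <- function(values, sizes) {
  stopifnot(sum(sizes) == length(values))
  seg_id <- rep(seq_along(sizes), times = sizes)   # segment label per locus
  as.vector(tapply(values, seg_id, mean))
}

values <- c(0.2, -0.1, 0.2, 2.1, 1.8, 2.4)   # toy loci values, not real data
means  <- segment_means(values, sizes = c(3, 3))
# The two distributions could then be compared, e.g. with base graphics:
# boxplot(list(loci = values, segments = means))
```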

2.3 Segment selection

Selection of significant segments using given criteria.
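The report lists the selection criteria (size, min, max, negpos, top) without defining them. One possible reading is sketched below: it assumes `size` is the minimum number of loci, `min`/`max` bound the absolute segment mean, and `negpos` keeps both directions (0), only negative (-1) or only positive (1) means; `top` is not modeled. Under this reading the parameters used here (size 1, min 0, max Inf, negpos 0) keep every segment, consistent with all 83 segments appearing in Table 2. The helper `select_segments()` is hypothetical, not the RoCA template's code.

```r
# Hypothetical interpretation of the "Segment selection" parameters:
#   size               - minimum number of loci in a segment
#   min_mean, max_mean - bounds on the absolute segment mean
#   negpos             - 0 keeps both directions, -1 negative only, 1 positive only
select_segments <- function(segs, size = 1, min_mean = 0, max_mean = Inf, negpos = 0) {
  keep <- segs$size >= size &
    abs(segs$mean) >= min_mean &
    abs(segs$mean) <= max_mean
  if (negpos < 0) keep <- keep & segs$mean < 0
  if (negpos > 0) keep <- keep & segs$mean > 0
  segs[keep, , drop = FALSE]
}

# Toy segment table using sizes and means from Table 2
segs <- data.frame(size = c(42, 6, 939), mean = c(1.8237, -0.4948, 0.1195))
select_segments(segs, size = 10, min_mean = 0.5)
```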

Table 2. Summary of selected segments: location, length (bp), size (number of loci), and statistics of log_rpkm_T2N_Rep0 at individual loci. The loci and segmentation columns link to the full list of loci within each segment and to a Manhattan plot of the segmentation.

segment chromosome start end length size mean minimum maximum variance loci segmentation
segment_1 chr1 11869 1297157 1285289 42 1.8237 -2.9879 7.7200 2.8884 table figure
segment_2 chr1 1309110 151138424 149829315 939 0.1195 -3.3195 9.7265 2.3538 table figure
segment_3 chr1 151138498 161332984 10194487 167 1.6856 -3.2377 9.3922 2.4026 table figure
segment_4 chr1 161334521 249231242 87896722 388 -0.4992 -3.3104 7.4969 2.1171 table figure
segment_5 chr2 217730 73302747 73085018 277 -0.2305 -3.3173 9.4369 2.5590 table figure
segment_6 chr2 73300510 85824736 12524227 57 1.2180 -3.1117 10.6000 2.6672 table figure
segment_7 chr2 85825671 217071026 131245356 420 -0.6368 -3.3177 10.5159 2.1969 table figure
segment_8 chr2 217122588 223809357 6686770 56 1.0739 -2.9800 8.5067 2.5086 table figure
segment_9 chr2 223782105 239077541 15295437 55 -0.4999 -3.0342 6.8650 2.1584 table figure
segment_10 chr2 239072633 242708231 3635599 38 0.7064 -2.9968 5.5398 2.1500 table figure
segment_11 chr3 3168600 48542259 45373660 211 -0.3001 -3.3113 7.6221 2.2196 table figure
segment_12 chr3 48555117 52740048 4184932 106 1.3100 -3.2285 8.0900 2.6327 table figure
segment_13 chr3 52738971 197766105 145027135 431 -0.3900 -3.3180 9.3609 2.3013 table figure
segment_14 chr4 53179 8442450 8389272 67 0.8188 -3.0207 6.4850 2.1142 table figure
segment_15 chr4 8437867 190884359 182446493 294 -0.8525 -3.3022 8.8853 2.1151 table figure
segment_16 chr5 140373 172462448 172322076 458 -0.2548 -3.3203 8.9192 2.3591 table figure
segment_17 chr5 172571445 178510538 5939094 44 1.5140 -3.2982 7.0060 2.9276 table figure
segment_18 chr5 178537852 180699168 2161317 21 0.7783 -2.7938 7.1790 2.9103 table figure
segment_19 chr6 142272 30032686 29890415 164 -0.6928 -3.3009 9.4131 2.2918 table figure
segment_20 chr6 30034486 34712450 4677965 132 1.8960 -3.1708 10.4208 2.6767 table figure
segment_21 chr6 34725183 170893780 136168598 397 -0.2978 -3.3127 10.0683 2.5031 table figure
segment_22 chr7 182935 75115548 74932614 314 -0.1027 -3.3166 10.3223 2.4028 table figure
segment_23 chr7 75123401 97528427 22405027 58 -0.8639 -3.3109 10.4613 2.9438 table figure
segment_24 chr7 97576299 99753567 2177269 40 0.7271 -3.0820 5.9881 2.3753 table figure
segment_25 chr7 99752043 102184211 2432169 55 1.2903 -2.9637 9.7112 2.9970 table figure
segment_26 chr7 102178365 158622944 56444580 229 -0.0874 -3.2913 9.8935 2.5625 table figure
segment_27 chr8 163251 144380231 144216981 318 -0.5191 -3.3051 7.7725 2.1373 table figure
segment_28 chr8 144386554 145583036 1196483 41 2.0250 -2.4528 7.4860 2.9499 table figure
segment_29 chr8 145577795 146281416 703622 27 1.2915 -3.0405 9.4505 3.1229 table figure
segment_30 chr9 14511 130210909 130196399 403 -0.4408 -3.3189 9.7481 2.3412 table figure
segment_31 chr9 130209953 132484875 2274923 70 1.8107 -3.2933 6.0126 2.0579 table figure
segment_32 chr9 132500610 136522435 4021826 51 0.7512 -3.3005 10.5682 2.7385 table figure
segment_33 chr9 136528682 140764468 4235787 75 1.7690 -3.3116 9.4640 2.7151 table figure
segment_34 chr10 180405 74114988 73934584 222 -0.6039 -3.2854 11.3180 2.1665 table figure
segment_35 chr10 74127098 79689582 5562485 49 0.2808 -3.2488 4.0006 2.1597 table figure
segment_36 chr10 79729008 97416463 17687456 75 -0.4830 -3.1955 6.8894 2.2848 table figure
segment_37 chr10 97423153 104498951 7075799 77 0.7405 -3.2661 10.7612 2.5309 table figure
segment_38 chr10 104503727 135516024 31012298 123 -0.9591 -3.2428 5.9188 1.9194 table figure
segment_39 chr11 127115 64139687 64012573 339 1.1132 -3.2791 9.5353 2.7856 table figure
segment_40 chr11 64532078 66104311 1572234 61 3.6046 -2.3831 10.1488 2.5706 table figure
segment_41 chr11 66104804 134135749 68030946 289 0.3603 -3.2895 10.7152 2.6900 table figure
segment_42 chr12 73725 52585784 52512060 231 0.0779 -3.2991 11.5443 2.6470 table figure
segment_43 chr12 52626304 58019934 5393631 99 1.8729 -2.7807 8.9971 2.4733 table figure
segment_44 chr12 58017193 133684130 75666938 302 0.0833 -3.3154 10.4853 2.4093 table figure
segment_45 chr13 19271143 103528345 84257203 172 -0.9196 -3.3218 7.0206 2.1967 table figure
segment_46 chr13 107028911 111373421 4344511 13 0.8679 -2.5207 6.2899 2.6676 table figure
segment_47 chr13 111530887 114116670 2585784 11 -0.5090 -3.1502 4.7364 2.4184 table figure
segment_48 chr13 114110134 115092796 982663 10 0.9738 -2.1502 5.8303 2.6760 table figure
segment_49 chr14 20724717 21945132 1220416 17 0.9554 -1.5697 5.5552 1.8794 table figure
segment_50 chr14 21944756 24910540 2965785 64 1.6351 -3.2699 6.9609 2.1987 table figure
segment_51 chr14 24908972 103049514 78140543 264 -0.4405 -3.3166 9.4866 2.2408 table figure
segment_52 chr14 103058998 106445233 3386236 38 1.2308 -2.8784 6.0088 2.1180 table figure
segment_53 chr15 20587869 72672051 52084183 235 -0.2409 -3.2961 10.1781 2.6687 table figure
segment_54 chr15 72766667 76005189 3238523 35 1.0646 -3.2145 6.0009 2.3475 table figure
segment_55 chr15 76135622 85682376 9546755 50 -0.3636 -3.2050 6.7080 2.1414 table figure
segment_56 chr15 85923802 91506349 5582548 33 1.1943 -3.1240 7.5430 2.4806 table figure
segment_57 chr15 91509270 102516768 11007499 19 -0.9622 -3.3060 3.2547 1.8334 table figure
segment_58 chr16 64043 70285833 70221791 477 0.9841 -3.3210 9.2451 2.4721 table figure
segment_59 chr16 70286293 84220669 13934377 60 -0.4343 -3.3201 3.9921 1.8463 table figure
segment_60 chr16 84511681 90114181 5602501 63 1.2604 -3.0606 8.7869 2.5420 table figure
segment_61 chr17 254326 78451643 78197318 763 0.8129 -3.3143 12.2915 2.6102 table figure
segment_62 chr17 78518619 81052864 2534246 58 2.4043 -2.5788 11.4107 2.8363 table figure
segment_63 chr18 158383 45457515 45299133 79 -0.7207 -3.2021 7.7021 2.2806 table figure
segment_64 chr18 46065417 47920543 1855127 8 0.9350 -1.2943 3.7130 1.7185 table figure
segment_65 chr18 48405419 77905406 29499988 44 -0.9610 -3.2691 5.2022 1.6438 table figure
segment_66 chr19 197124 19312678 19115555 396 1.6494 -3.2818 9.8794 2.5766 table figure
segment_67 chr19 19312218 36036218 16724001 59 -0.1551 -3.2600 4.4556 2.0650 table figure
segment_68 chr19 36031640 50980010 14948371 277 1.5884 -3.2356 11.0569 2.4361 table figure
segment_69 chr19 50979657 54635140 3655484 46 -0.6264 -3.2650 7.0602 3.0636 table figure
segment_70 chr19 54641444 56729146 2087703 42 1.6853 -2.8288 6.4297 2.5702 table figure
segment_71 chr19 56879468 59111168 2231701 59 -0.5435 -2.7289 7.9051 2.2058 table figure
segment_72 chr20 251504 4040760 3789257 56 0.9768 -3.2685 8.9238 2.6908 table figure
segment_73 chr20 4101627 19804587 15702961 46 -0.7004 -3.1188 3.6852 1.6596 table figure
segment_74 chr20 19867165 62731996 42864832 305 0.5528 -3.2053 10.9322 2.3615 table figure
segment_75 chr21 11180920 45182188 34001269 100 -0.4948 -3.2444 9.7711 2.1644 table figure
segment_76 chr21 45192393 46221934 1029542 14 1.2117 -3.3131 5.1460 2.2903 table figure
segment_77 chr21 46225532 46646478 420947 6 0.8318 -1.6462 3.4426 2.1148 table figure
segment_78 chr21 46683843 47679304 995462 14 2.5220 -2.2295 9.1911 3.1769 table figure
segment_79 chr21 47655047 48111157 456111 8 -0.3383 -2.6454 1.8682 1.4941 table figure
segment_80 chr22 16122720 51239737 35117018 407 0.4632 -3.2973 10.3852 2.4520 table figure
segment_81 chrX 220013 152865500 152645488 421 0.1594 -3.2978 10.8871 2.5949 table figure
segment_82 chrX 152869952 153719016 849065 32 2.6375 -3.2218 10.1131 3.0795 table figure
segment_83 chrX 153733350 154688276 954927 17 0.0335 -3.2665 5.8921 2.6093 table figure

2.4 Randomization

The same criteria were applied repeatedly to identify and select segments from 10 sets of randomized data, and the summary statistics of the selected segments were compared with those from the original data.
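The report does not describe how the data were randomized. A minimal sketch under one assumption: values are permuted among loci within each chromosome (one possible reading of the "chromosome: 1" randomization option), so each chromosome keeps its own value distribution while positional structure is destroyed, and this is repeated for 10 rounds. The helper `randomize_values()` is hypothetical.

```r
# Hypothetical randomization: permute values among loci, either within each
# chromosome (default) or genome-wide.
randomize_values <- function(values, chromosome, within_chromosome = TRUE) {
  if (!within_chromosome) return(values[sample.int(length(values))])
  out <- values
  for (chr in unique(chromosome)) {
    idx <- which(chromosome == chr)
    out[idx] <- values[idx][sample.int(length(idx))]   # permute within chr
  }
  out
}

set.seed(2024)
v   <- c(0.5, -1.2, 2.0, 0.3, 1.1, -0.4)              # toy values
chr <- c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2")
random_sets <- replicate(10, randomize_values(v, chr), simplify = FALSE)
```

Indexing with `sample.int()` rather than calling `sample()` on the values avoids R's special case where `sample(x)` on a single number returns a permutation of `1:x`.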

Table 3. Means of the summary statistics of segments identified and selected from the original data vs. 10 sets of randomized data: number of loci (size), segment length, and mean and standard deviation of log_rpkm_T2N_Rep0 within segments. When segment means include both positive and negative values, their absolute values are used in this table.

data size length mean variance
original 158.1928 35301623 0.92 2.4152
random_1 59.1441 13102549 0.68 0.8142
random_2 57.5877 12668005 0.68 0.8420
random_3 64.0488 14184075 0.70 0.8490
random_4 61.6432 13616218 0.70 0.8017
random_5 76.7836 17051403 0.68 0.8363
random_6 65.3234 14364656 0.67 0.8387
random_7 58.0973 12853151 0.69 0.8366
random_8 58.3556 12899710 0.75 0.8600
random_9 64.0488 14032478 0.71 0.8151
random_10 61.3551 13552694 0.68 0.8291
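Each row of Table 3 averages per-segment statistics over all selected segments from one data set, using absolute values for the segment means. A minimal sketch of one such row, assuming a segment table with the same columns as Table 2 (start, end, size, mean, variance); the helper `table3_row()` is hypothetical.

```r
# Hypothetical helper producing one row of Table 3 from a segment table.
table3_row <- function(segs) {
  c(size     = mean(segs$size),
    length   = mean(segs$end - segs$start + 1),   # genomic span in bp
    mean     = mean(abs(segs$mean)),              # absolute segment means
    variance = mean(segs$variance))
}

# First two segments of Table 2
segs <- data.frame(start = c(11869, 1309110), end = c(1297157, 151138424),
                   size = c(42, 939), mean = c(1.8237, 0.1195),
                   variance = c(2.8884, 2.3538))
table3_row(segs)
```

Applying the same function to the segments from each randomized data set would give the random_1 to random_10 rows.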

Figure 3. Relationship between segment size and segment mean of log_rpkm_T2N_Rep0. Each dot represents a segment derived from either the original data (blue) or the randomized data (grey).

Figure 4. Distribution of segment size compared between original and randomized data.

Figure 5. Distribution of segment length compared between original and randomized data.

Figure 6. Distribution of log_rpkm_T2N_Rep0 mean of segments compared between original and randomized data.

Figure 7. Distribution of log_rpkm_T2N_Rep0 standard deviation of segments compared between original and randomized data.

4 Appendix

Check out the RoCA home page for more information.

4.1 Reproduce this report

To reproduce this report:

  1. Find the data analysis template you want to use and an example of its paired YAML file here, then download the YAML example to your working directory.

  2. To generate a new report using your own input data and parameters, edit the following items in the YAML file:

    • output : where you want to put the output files
    • home : the URL if you have a home page for your project
    • analyst : your name
    • description : background information about your project, analysis, etc.
    • input : location of your input data; read the instructions for preparing them
    • parameter : parameters for this analysis; read the instructions on how to set them
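Before running the report, it can help to confirm that the edited configuration still contains every field listed above. A minimal sketch, assuming these are top-level keys of the YAML file; the helper `missing_fields()` and the example values are hypothetical. In practice the list would come from the file itself, e.g. `yaml::read_yaml("filename.yaml")`.

```r
# Top-level fields the report configuration is assumed to contain
required_fields <- c("output", "home", "analyst", "description", "input", "parameter")

# Hypothetical helper: return the required fields absent from a parsed config
missing_fields <- function(config) {
  setdiff(required_fields, names(config))
}

# Toy configuration with some fields not yet filled in (hypothetical values)
config <- list(output  = "~/reports/fastseg_T2N",
               analyst = "Your Name",
               input   = "data/input.rds")
missing_fields(config)   # "home" "description" "parameter"
```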
  3. Run the code below in the R console or RStudio, preferably in a new R session:

# Install the helper packages if needed, then attach them
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("RCurl", quietly = TRUE)) install.packages("RCurl")
if (!requireNamespace("RoCA", quietly = TRUE)) devtools::install_github("zhezhangsh/RoCAR")
library(RCurl)
library(RoCA)

CreateReport("filename.yaml")  # path to the YAML file you just downloaded and edited

If the run finishes without errors, go to the output folder and open the index.html file to view the report.
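The last step can also be done from R. A minimal sketch, where `open_report()` is a hypothetical helper and `output_dir` is whatever folder was set in the `output` field of the YAML file; it uses base R's `utils::browseURL()` and only opens the browser in an interactive session.

```r
# Hypothetical helper: locate index.html in the report's output folder and
# open it in the default browser (only in interactive sessions by default).
open_report <- function(output_dir, open = interactive()) {
  index <- file.path(output_dir, "index.html")
  if (!file.exists(index)) stop("index.html not found in ", output_dir)
  if (open) utils::browseURL(index)
  invisible(index)
}

# Usage (hypothetical folder):
# open_report("~/reports/fastseg_T2N")
```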

4.2 Session information

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] xlsx_0.6.1            vioplot_0.3.0         zoo_1.8-4            
##  [4] sm_2.2-5.6            fastseg_1.28.0        Biobase_2.42.0       
##  [7] GenomicRanges_1.34.0  GenomeInfoDb_1.18.1   IRanges_2.16.0       
## [10] S4Vectors_0.20.1      BiocGenerics_0.28.0   DEGandMore_0.0.0.9000
## [13] snow_0.4-3            htmlwidgets_1.5.1     DT_0.15              
## [16] kableExtra_0.9.0      awsomics_0.0.0.9000   yaml_2.2.1           
## [19] rmarkdown_1.10        knitr_1.20            RoCA_0.0.0.9000      
## [22] RCurl_1.95-4.11       bitops_1.0-6          devtools_2.3.1       
## [25] usethis_1.6.1        
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2             pkgload_1.0.2          jsonlite_1.6.1        
##  [4] viridisLite_0.3.0      assertthat_0.2.1       highr_0.7             
##  [7] xlsxjars_0.6.1         GenomeInfoDbData_1.2.0 remotes_2.2.0         
## [10] sessioninfo_1.1.1      pillar_1.4.6           backports_1.1.5       
## [13] lattice_0.20-38        glue_1.3.2             digest_0.6.25         
## [16] XVector_0.22.0         rvest_0.3.2            colorspace_1.4-1      
## [19] htmltools_0.4.0        pkgconfig_2.0.3        zlibbioc_1.28.0       
## [22] scales_1.1.1           processx_3.4.2         tibble_2.1.3          
## [25] ellipsis_0.3.0         withr_2.2.0            cli_2.0.2             
## [28] magrittr_1.5           crayon_1.3.4           memoise_1.1.0         
## [31] evaluate_0.14          ps_1.3.2               fs_1.3.2              
## [34] fansi_0.4.1            xml2_1.2.0             pkgbuild_1.1.0        
## [37] tools_3.5.1            prettyunits_1.1.1      hms_0.4.2             
## [40] lifecycle_0.2.0        stringr_1.3.1          munsell_0.5.0         
## [43] callr_3.4.3            compiler_3.5.1         rlang_0.4.5           
## [46] grid_3.5.1             rstudioapi_0.11        crosstalk_1.1.0.1     
## [49] tcltk_3.5.1            testthat_2.3.2         R6_2.4.1              
## [52] rprojroot_1.3-2        readr_1.3.1            desc_1.2.0            
## [55] rJava_0.9-11           stringi_1.2.4          Rcpp_1.0.5
